Abstract: The performance of bit-interleaved coded modulation (BICM) with bit shaping (i.e., non-equiprobable bit probabilities in the underlying binary code) is studied. For the Gaussian
channel, the rates achievable with BICM and bit shaping are 
practically identical to those of coded modulation or multilevel 
coding. This identity holds for the whole range of values of signal- 
to-noise ratio. Moreover, the random coding error exponent of 
BICM significantly exceeds that of multilevel coding and is very 
close to that of coded modulation. 



I. Introduction 

For non-binary modulations in the Gaussian channel, three main constructions of coding schemes that achieve information rates close to the channel capacity are known: coded modulation (CM), bit-interleaved coded modulation (BICM), and multilevel coding (MLC). CM dates back to the pioneering work of Ungerboeck [1], and merges coding and modulation in a single entity. In contrast, BICM separates them, and is built around the mapping of a simple binary code onto a non-binary modulation [2], [3], [4]. MLC makes use of a layer of binary codes, one for each bit in the binary label of the modulation symbol [5], [6].

CM allows for the highest information rates. It is closely 
followed by multilevel coding (for equiprobable modulation 
symbols the rates coincide) and, with a larger loss, by BICM. 
In terms of error exponents, the situation is somewhat reversed, 
with CM again the best, but now BICM beats multilevel 
coding at low rates. Whereas previous analyses in the literature 
assume that the modulation symbols are used equiprobably, 
in this work we lift this assumption and consider shaping, 
whereby the bit or symbol probabilities are arbitrary. We 
will see that BICM with shaping achieves both information 
rates and error exponents very close to those of CM, thus 
closing the gap which made multilevel coding better in terms 
of information rates. 

This paper is organized as follows. In Sect. II we introduce the necessary concepts and notation describing the various schemes. In Sect. III we derive the achievable rates for BICM with shaping by using mismatched decoding theory. These results are particularized for the Gaussian channel in Sect. IV, which also includes some numerical results.



II. Preliminaries 
A. Blockwise Coded Transmission 

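Consider a memoryless channel with input $X$ and output $Y$, respectively belonging to the sets $\mathcal{X}$ and $\mathcal{Y}$. A block code $\mathcal{M} \subseteq \mathcal{X}^N$ is a set of $|\mathcal{M}|$ vectors (or codewords) $\mathbf{x}$ of length $N$ (the number of channel uses), i.e. $\mathbf{x} = (x_1, \dots, x_N) \in \mathcal{X}^N$. The output $\mathbf{y} = (y_1, \dots, y_N)$ is a random transformation of the input with transition probability distribution $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$. For memoryless channels the distribution $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ admits the decomposition

$$P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{k=1}^{N} P_{Y|X}(y_k|x_k). \tag{1}$$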

With no loss of generality, we limit our attention to continuous output and identify $P_{Y|X}(y|x)$ as a probability density function. We adopt the convention that capital letters represent random variables, while the corresponding small letters denote realizations of the variables.

At the source, a message $m$ drawn with equal probability from a message set $\mathcal{M}$ is mapped onto a codeword $\mathbf{x}$. We denote this encoding function by $\phi$, i.e. $\phi(m) = \mathbf{x}$. The corresponding transmission rate $R$ is given by $R = \frac{1}{N} \log |\mathcal{M}|$. At the receiver, the decoder determines the codeword decoding metric, denoted by $q(\mathbf{x}, \mathbf{y})$, for all codewords, and outputs the message $\hat{m}$ whose metric is largest,

$$\hat{m} = \arg\max_{m \in \{1, \dots, |\mathcal{M}|\}} q(\phi(m), \mathbf{y}). \tag{2}$$

The metrics we consider are products of symbol decoding metrics $q(x, y)$, namely (with some abuse of notation)

$$q(\mathbf{x}, \mathbf{y}) = \prod_{k=1}^{N} q(x_k, y_k). \tag{3}$$
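The decoding rule of Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy codebook, the received vector, the real-valued Gaussian symbol metric and the 0-based message indices are all assumptions made for the example.

```python
import math

def codeword_log_metric(x, y, symbol_metric):
    """Log of the product metric in Eq. (3): a sum of per-symbol log-metrics."""
    return sum(math.log(symbol_metric(xk, yk)) for xk, yk in zip(x, y))

def decode(codebook, y, symbol_metric):
    """Index of the codeword with the largest metric, as in Eq. (2).

    Messages are indexed from 0 here, whereas the text uses 1..|M|.
    """
    return max(range(len(codebook)),
               key=lambda m: codeword_log_metric(codebook[m], y, symbol_metric))

def gaussian_metric(x, y):
    # Hypothetical symbol metric: unnormalized real Gaussian likelihood
    # with unit noise variance (matched, i.e. ML, decoding for that channel).
    return math.exp(-0.5 * (y - x) ** 2)

# Toy codebook with three codewords of length N = 3 (assumed for illustration).
codebook = [(-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), (1.0, -1.0, 1.0)]
received = (0.9, -0.2, 1.3)
decoded = decode(codebook, received, gaussian_metric)
print(decoded)
```

Working with the sum of log-metrics instead of the product avoids numerical underflow for long codewords while leaving the maximizer unchanged.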
For maximum likelihood (ML) decoders, the decoding metric is given by $q(x, y) = P_{Y|X}(y|x)$. More generally, a decoder finds the most likely codeword as long as the metric $q(x, y)$ is a strictly increasing bijective function of the channel transition probability $P_{Y|X}(y|x)$. If instead the metric $q(x, y)$ is not a bijective function of the channel transition probability, we have a mismatched decoder [7], [8].

Of special interest is the random ensemble corresponding to CM, for which 1) the channel inputs are selected independently for each codeword component according to a probability distribution $P_X(x)$, and 2) the decoder uses the ML metric. In this case, and for practical reasons, the modulation set $\mathcal{X}$ is taken finite. Let $M = |\mathcal{X}|$ denote the cardinality of $\mathcal{X}$ and $m = \log_2 M$ the number of bits required to index a symbol. The largest information rate that can be achieved with CM under the constraint $x \in \mathcal{X}$ is

$$C^{\mathrm{cm}} = \sup_{P_X(X)} I(X; Y). \tag{4}$$



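Moreover, for any input distribution $P_X(X)$, the block error probability $P_e$ satisfies [9]

$$P_e \le e^{-N E_r(R)}, \tag{5}$$

where $E_r(R) = \sup_{0 \le \rho \le 1} E_0(\rho) - \rho R$ and

$$E_0(\rho) = -\log E\left[\left(\sum_{x'} P_X(x') \left(\frac{P_{Y|X}(Y|x')}{P_{Y|X}(Y|X)}\right)^{\frac{1}{1+\rho}}\right)^{\rho}\right]. \tag{6}$$

The expectation is carried out according to the joint distribution $P_{X,Y}(x, y) = P_{Y|X}(y|x) P_X(x)$.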
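As a small numerical illustration of Eqs. (5) and (6), the Gallager function can be evaluated directly for a discrete channel. The sketch below is an assumption-laden example rather than part of the paper: it uses a binary symmetric channel with crossover probability 0.1 (the paper deals with continuous outputs), natural logarithms, and a simple grid search over $\rho$.

```python
import math

def gallager_e0(rho, px, channel):
    """E_0(rho) of Eq. (6) for a discrete channel, channel[x][y] = P(y|x)."""
    total = 0.0
    for x, p_x in enumerate(px):
        for y, p_yx in enumerate(channel[x]):
            if p_x == 0.0 or p_yx == 0.0:
                continue  # zero-probability pairs do not enter the expectation
            inner = sum(p2 * (channel[x2][y] / p_yx) ** (1.0 / (1.0 + rho))
                        for x2, p2 in enumerate(px))
            total += p_x * p_yx * inner ** rho
    return -math.log(total)

def random_coding_exponent(rate, px, channel, grid=1000):
    """E_r(R) = sup over 0 <= rho <= 1 of E_0(rho) - rho * R (grid search)."""
    return max(gallager_e0(k / grid, px, channel) - (k / grid) * rate
               for k in range(grid + 1))

# Hypothetical example: binary symmetric channel, equiprobable inputs.
bsc = [[0.9, 0.1], [0.1, 0.9]]
uniform = [0.5, 0.5]
cutoff_rate = gallager_e0(1.0, uniform, bsc)   # E_0(1), in nats
exponent_at_zero_rate = random_coding_exponent(0.0, uniform, bsc)
```

For this channel $E_0(1) = \log 1.25$ nats, which is the well-known cutoff rate of the binary symmetric channel with crossover probability 0.1, and $E_0(0) = 0$ as expected.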

B. Bit-Interleaved Coded Modulation 

In practical CM schemes, since the codeword symbols are selected elements of $\mathcal{X}$ and the alphabet $\mathcal{X}$ typically has more than 2 elements, the corresponding codes are in some sense non-binary. BICM is a different construction where the underlying code is binary. BICM was originally analyzed in [3] under the assumption of infinite-depth interleaving; this restriction was recently lifted in [4], [10], where it was shown that BICM has a natural description in terms of mismatched decoding.

The BICM encoder generates a vector of $mN$ bits, $\mathbf{b} = (b_1, \dots, b_{mN})$, i.e. $\phi(m) = \mathbf{b}$. This vector is mapped onto a vector of $N$ modulation symbols according to a labeling rule $\mu : \{0,1\}^m \to \mathcal{X}$, such that

$$x_k = \mu\big(b_{(k-1)m+1}, \dots, b_{(k-1)m+m}\big), \quad k = 1, \dots, N. \tag{7}$$

Note that the interleaver which gives its name to BICM has been absorbed in this description of the encoder. Analogously, we denote the inverse labeling by $b_j$, so that $b_j(x)$ is the $j$-th bit in the binary label of modulation symbol $x$, for $j = 1, \dots, m$. By construction, the modulation symbols $x$ are used with probabilities

$$P_X^{\mathrm{bicm}}(x) = \prod_{j=1}^{m} P_{B_j}\big(b_j(x)\big). \tag{8}$$
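The product form of Eq. (8) is easy to evaluate numerically. The following sketch assumes a hypothetical 4-PAM constellation with binary reflected Gray labels and an arbitrary choice of bit probabilities; the names `bicm_symbol_probs` and `p_one` are introduced here for illustration only.

```python
def bicm_symbol_probs(labels, p_one):
    """Eq. (8): P(x) = prod_j P_{B_j}(b_j(x)) for every symbol.

    labels[i] is the bit tuple (b_1, ..., b_m) of symbol i and
    p_one[j] is the probability that the bit in position j equals 1.
    """
    probs = []
    for bits in labels:
        p = 1.0
        for j, b in enumerate(bits):
            p *= p_one[j] if b == 1 else 1.0 - p_one[j]
        probs.append(p)
    return probs

# Hypothetical 4-PAM example, points ordered from left to right,
# labeled with the 2-bit binary reflected Gray code.
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]
probs = bicm_symbol_probs(labels, p_one=[0.5, 0.7])
print(probs)  # outer points get 0.15 each, inner points 0.35 each
```

With the first (sign) bit equiprobable and the second bit biased towards 1, the two inner points become more likely than the two outer ones, which is the kind of shaping considered in the paper.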



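In addition to the different code construction, BICM also differs from CM at the receiver side. First, let us define the sets $\mathcal{X}_b^j$ as those elements of $\mathcal{X}$ having bit $b$ in the $j$-th label position, i.e., $\mathcal{X}_b^j = \{x \in \mathcal{X} : b_j(x) = b\}$. The BICM symbol metric combines the $m$ bit metrics $q_j(b_j, y)$ given by

$$q_j\big(b_j(x) = b, y\big) = \sum_{x' \in \mathcal{X}_b^j} P_{Y|X}(y|x') P_X^{\mathrm{bicm}}(x'), \tag{9}$$

as if the $m$ bits in a symbol were independent, i.e.,

$$q(x, y) = \prod_{j=1}^{m} q_j\big(b_j(x), y\big). \tag{10}$$

Hence, the BICM receiver uses the following symbol metric

$$q^{\mathrm{bicm}}(x, y) = \prod_{j=1}^{m} \sum_{x' \in \mathcal{X}_{b_j(x)}^j} P_{Y|X}(y|x') \prod_{j'=1}^{m} P_{B_{j'}}\big(b_{j'}(x')\big). \tag{11}$$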
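The bit metrics of Eq. (9) and the resulting symbol metric of Eq. (11) can be sketched numerically as below. The constellation, labels, bit probabilities and the real-valued unit-variance Gaussian channel are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

POINTS = [-3.0, -1.0, 1.0, 3.0]              # hypothetical unnormalized 4-PAM
LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]    # binary reflected Gray labels
P_ONE = [0.5, 0.7]                           # assumed values of P(B_j = 1)

def symbol_prob(bits):
    """Eq. (8): product of the bit probabilities of a label."""
    p = 1.0
    for j, b in enumerate(bits):
        p *= P_ONE[j] if b else 1.0 - P_ONE[j]
    return p

def channel_pdf(y, x):
    """Real Gaussian density with unit noise variance (an assumption)."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)

def bit_metric(j, b, y):
    """Eq. (9): sum over the symbols whose j-th label bit equals b."""
    return sum(channel_pdf(y, x) * symbol_prob(bits)
               for x, bits in zip(POINTS, LABELS) if bits[j] == b)

def symbol_metric(bits, y):
    """Eq. (11): product over the label positions of the bit metrics."""
    return math.prod(bit_metric(j, b, y) for j, b in enumerate(bits))

y = 0.4
output_density = sum(channel_pdf(y, x) * symbol_prob(bits)
                     for x, bits in zip(POINTS, LABELS))
best_label = max(LABELS, key=lambda bits: symbol_metric(bits, y))
```

A quick sanity check is that, for every label position $j$, the two bit metrics add up to the channel output density, since the sets $\mathcal{X}_0^j$ and $\mathcal{X}_1^j$ partition the constellation.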



C. Multilevel Coding 

Multilevel codes (MLC) combined with multistage decoding (MSD) have been proposed [5], [6] as an efficient method to attain the channel capacity by using binary codes. For BICM, a single binary code $\mathcal{C}$ is used to generate a binary codeword, which is used to select modulation symbols by a binary labeling function $\mu$. In MLC, the input binary code $\mathcal{C}$ is the Cartesian product of $m$ binary codes of length $N$, one per modulation level, i.e. $\mathcal{C} = \mathcal{C}_1 \times \dots \times \mathcal{C}_m$, and the input distribution for the symbol $x(b_1, \dots, b_m)$ has the form

$$P_X^{\mathrm{mlc}}(x) = \prod_{j=1}^{m} P_{B_j}(b_j). \tag{12}$$

For a fixed input distribution on the bits, MLC achieves the mutual information [5], [6] both with ML joint decoding and with multistage decoding. The largest information rate that can be achieved with MLC under the constraint $x \in \mathcal{X}$ is

$$C^{\mathrm{mlc}} = \sup_{P_{B_1}(B_1), \dots, P_{B_m}(B_m)} I(X; Y). \tag{13}$$

The error exponents of MLC with multistage decoding were derived in [11], [12], [4], [13], where it was also shown that the error exponent is upper bounded by one.

III. Achievable Rates with BICM 

For the BICM scheme described above, it was shown in [4], [10] that the rate

$$R^{\mathrm{gmi}} = \sup_{s > 0} E\left[\log \frac{\prod_{j=1}^{m} q_j\big(b_j(X), Y\big)^s}{\frac{1}{2^m} \sum_{x'} \prod_{j=1}^{m} q_j\big(b_j(x'), Y\big)^s}\right], \tag{14}$$

also named generalized mutual information (GMI), is achievable with equiprobable bits, $P_{B_j}(b) = \frac{1}{2}$. The proof is based on a simple extension of Gallager's analysis of ML decoding in terms of error exponents to mismatched decoding [7], [8]. References [4], [10] also show that the above rate may be decomposed as the sum of $m$ bit GMI terms, and that it coincides with the BICM capacity defined in [3]. The next result generalizes this result to arbitrary bit probabilities.

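Theorem 1: The generalized mutual information of the BICM mismatched decoder is equal to the sum of the generalized mutual informations of $m$ binary-input channels,

$$R^{\mathrm{gmi}} = \sup_{s > 0} \sum_{j=1}^{m} E\left[\log \frac{q_j(B_j, Y)^s}{\sum_{b' \in \{0,1\}} P_{B_j}(b')\, q_j(b', Y)^s}\right]. \tag{15}$$

The expectation is carried out according to the joint distribution $P_{B_j}(b)\, p_j(y|b)$, with

$$p_j(y|b) = \frac{\sum_{x \in \mathcal{X}_b^j} P_{Y|X}(y|x) P_X^{\mathrm{bicm}}(x)}{\sum_{x' \in \mathcal{X}_b^j} P_X^{\mathrm{bicm}}(x')}. \tag{16}$$

An alternative expression is

$$R^{\mathrm{gmi}} = \sup_{s > 0} \sum_{j=1}^{m} E\left[\log \frac{q_j\big(b_j(X), Y\big)^s}{\sum_{b' \in \{0,1\}} P_{B_j}(b')\, q_j(b', Y)^s}\right], \tag{17}$$

where the expectation is done according to the joint distribution $P_X^{\mathrm{bicm}}(x) P_{Y|X}(y|x)$.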

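Proof: For fixed $s$ and probabilities $P_X^{\mathrm{bicm}}(x) = \prod_{j=1}^{m} P_{B_j}\big(b_j(x)\big)$ the GMI can be written as

$$R^{\mathrm{gmi}}(s) = E\left[\log \frac{q(X, Y)^s}{\sum_{x'} P_X^{\mathrm{bicm}}(x')\, q(x', Y)^s}\right] \tag{18}$$

$$= E\left[\log \frac{\prod_{j=1}^{m} q_j\big(b_j(X), Y\big)^s}{\sum_{x'} \prod_{j=1}^{m} P_{B_j}\big(b_j(x')\big)\, q_j\big(b_j(x'), Y\big)^s}\right], \tag{19}$$

where the expectation is carried out according to $P_X^{\mathrm{bicm}}(x) P_{Y|X}(y|x)$.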

We now have a closer look at the denominator in the logarithm of (19). The key observation is that the sum over the constellation points ($x' \in \mathcal{X}$) of the product of a function $f(b_j(x'))$ evaluated at all the binary label positions admits an alternative expression, namely

$$\sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} f\big(b_j(x')\big) = \prod_{j=1}^{m} \sum_{b' \in \{0,1\}} f(b'). \tag{20}$$

Indeed, after carrying out the product in the right-hand side, we obtain the desired sum over all $2^m$ binary $m$-tuples $(b_1, \dots, b_m)$ of summands of the form $f(b_1) \cdots f(b_m)$.
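The sum-product exchange in Eq. (20) can be checked numerically. The sketch below assumes that the labeling is a bijection between the constellation and the set of binary $m$-tuples, so that the sum over $x'$ is a sum over all $2^m$ labels; it also lets the function differ per label position (written `f[j]` here, a hypothetical name), which is how the identity is used in the proof.

```python
import math
from itertools import product

def sum_of_products(f, m):
    """Left-hand side of Eq. (20): sum over all 2^m labels of prod_j f_j(b_j)."""
    return sum(math.prod(f[j][b] for j, b in enumerate(bits))
               for bits in product((0, 1), repeat=m))

def product_of_sums(f, m):
    """Right-hand side of Eq. (20): prod_j (f_j(0) + f_j(1))."""
    return math.prod(f[j][0] + f[j][1] for j in range(m))

# Arbitrary positive values f_j(0), f_j(1) for m = 3 label positions.
f = [(0.2, 1.5), (0.7, 0.1), (2.0, 0.3)]
lhs = sum_of_products(f, 3)
rhs = product_of_sums(f, 3)
```

The left-hand side needs $2^m$ products while the right-hand side needs only $m$ two-term sums, which is what makes the per-bit decomposition of the GMI possible.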

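Therefore, for the specific choice $f\big(b_j(x')\big) = q_j\big(b_j(x'), Y\big)^s P_{B_j}\big(b_j(x')\big)$ we have the product over all label positions of the sum of the probabilities of the bit $b_j$ being zero and one, i.e.,

$$\sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} q_j\big(b_j(x'), Y\big)^s P_{B_j}\big(b_j(x')\big) \tag{21}$$

$$= \prod_{j=1}^{m} \left(\sum_{b' \in \{0,1\}} q_j(b', Y)^s P_{B_j}(b')\right). \tag{22}$$

Next, going back to (19), we obtain

$$R^{\mathrm{gmi}}(s) = E\left[\log \prod_{j=1}^{m} \frac{q_j\big(b_j(X), Y\big)^s}{\sum_{b'=0}^{1} q_j(b', Y)^s P_{B_j}(b')}\right] \tag{23}$$

$$= \sum_{j=1}^{m} E\left[\log \frac{q_j\big(b_j(X), Y\big)^s}{\sum_{b'=0}^{1} q_j(b', Y)^s P_{B_j}(b')}\right], \tag{24}$$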


where the expectation is done according to the joint distribution $P_X^{\mathrm{bicm}}(x) P_{Y|X}(y|x)$. This gives Eq. (17) since the generalized mutual information is the supremum over all $s > 0$ [7]. As for Eq. (15), we derive it by noting that, for each $j$, the summation over $x$ in the expectation can be split into two parts and rearranged as follows,

$$\sum_{x \in \mathcal{X}} f(x) = \sum_{b_j \in \{0,1\}} \sum_{x \in \mathcal{X}_{b_j}^j} f(x) \tag{25}$$

$$= \sum_{b_j \in \{0,1\}} P_{B_j}(b_j) \frac{1}{P_{B_j}(b_j)} \sum_{x \in \mathcal{X}_{b_j}^j} f(x). \tag{26}$$

As $P_{B_j}(b_j) = \sum_{x' \in \mathcal{X}_{b_j}^j} P_X^{\mathrm{bicm}}(x')$ by construction, recovering the expression of $f(x)$ we obtain $p_j(y|b_j)$ in Eq. (16). ■

The following result applies to BICM with the decoding metric given in Eq. (9).

Corollary 1: For the classical BICM decoder with metric in Eq. (9) the supremum over $s$ is achieved at $s = 1$, and

$$R^{\mathrm{gmi}} = \sum_{j=1}^{m} I(B_j; Y).$$

Proof: Since the metric $q_j(b_j, y)$ is proportional to $p_j(y|b_j)$, we can identify the quantity

$$E\left[\log \frac{q_j(B_j, Y)^s}{\sum_{b'=0}^{1} q_j(b', Y)^s P_{B_j}(b')}\right] \tag{27}$$

as the generalized mutual information of a matched binary-input channel with transitions $p_j(y|b_j)$. Then, the supremum over $s$ is achieved at $s = 1$ (that is, the mutual information $I(B_j; Y)$) and we get the desired result. ■

In the remainder of the paper, for the sake of simplicity and without loss of generality, we focus on the classical BICM metric. Clearly, the methods and results we present generalize to other metrics, in which case $s$ should also be optimized.
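For the classical metric, the achievable rate is the sum of the bit-level mutual informations $I(B_j; Y)$, which can be compared numerically with the CM rate $I(X; Y)$ for the same input distribution. The sketch below makes several assumptions that are not in the paper: a real-valued 4-PAM constellation with Gray labels instead of complex QAM, unit noise variance, an equiprobable first (sign) bit so that the constellation has zero mean, and plain trapezoidal integration over the channel output.

```python
import math

POINTS = [-3.0, -1.0, 1.0, 3.0]              # hypothetical 4-PAM, left to right
LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]    # binary reflected Gray labels

def symbol_probs(p_one):
    """Eq. (8) for the labels above; p_one[j] = P(B_j = 1)."""
    probs = []
    for bits in LABELS:
        p = 1.0
        for j, b in enumerate(bits):
            p *= p_one[j] if b else 1.0 - p_one[j]
        probs.append(p)
    return probs

def rates(snr, p_one, steps=4000):
    """Return (sum_j I(B_j;Y), I(X;Y)) in bits for a real AWGN channel."""
    px = symbol_probs(p_one)
    energy = sum(p * x * x for p, x in zip(px, POINTS))
    xs = [math.sqrt(snr / energy) * x for x in POINTS]   # average energy = snr
    lo, hi = min(xs) - 8.0, max(xs) + 8.0
    dy = (hi - lo) / steps
    bicm = cm = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        w = dy * (0.5 if i in (0, steps) else 1.0)       # trapezoidal weight
        # joint[i] = P(x_i) p(y | x_i)
        joint = [p * math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)
                 for p, x in zip(px, xs)]
        py = sum(joint)
        for p, jt in zip(px, joint):
            if jt > 0.0:
                cm += w * jt * math.log2(jt / (p * py))
        for j in range(len(p_one)):
            for b in (0, 1):
                pb = p_one[j] if b else 1.0 - p_one[j]
                # q = P(B_j = b) p_j(y | b), cf. Eqs. (9) and (16)
                q = sum(jt for jt, bits in zip(joint, LABELS) if bits[j] == b)
                if q > 0.0:
                    bicm += w * q * math.log2(q / (pb * py))
    return bicm, cm

bicm_uniform, cm_uniform = rates(10.0, [0.5, 0.5])
bicm_shaped, cm_shaped = rates(10.0, [0.5, 0.7])
```

Sweeping the second bit probability in `p_one` and keeping the largest BICM rate gives a crude one-parameter version of the shaping optimization discussed in the text.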

The above results suggest that we can choose the input bit distribution that yields the largest GMI, i.e., effectively shaping the bit probabilities in BICM as

$$C^{\mathrm{bicm}} = \sup_{P_{B_1}(B_1), \dots, P_{B_m}(B_m)} \sum_{j=1}^{m} I(B_j; Y). \tag{28}$$

For i.i.d. codebooks, $C^{\mathrm{bicm}}$ is also the largest rate that can be transmitted with vanishing error probability [10].

This capacity should be compared with the equivalent quantities for CM and multilevel coding, given in Eqs. (4) and (13) respectively,

$$C^{\mathrm{cm}} = \sup_{P_X(X)} I(X; Y), \tag{29}$$

$$C^{\mathrm{mlc}} = \sup_{P_{B_1}(B_1), \dots, P_{B_m}(B_m)} I(X; Y). \tag{30}$$

Note that BICM differs from CM in the transmitter, where the modulation symbol probabilities have the specific form $P_X^{\mathrm{bicm}}(x) = \prod_{j=1}^{m} P_{B_j}\big(b_j(x)\big)$, and the receiver, where the symbol metric in Eq. (11) is used for decoding.

In terms of the random coding error exponent, the analysis in [4], [10] can be merged with the previous proof to show that for any input distribution $P_X(x)$, the block error probability $P_e$ is upper bounded by

$$P_e \le e^{-N E_r^q(R)}, \tag{31}$$

where $E_r^q(R) = \sup_{0 \le \rho \le 1,\, s > 0} E_0^q(\rho, s) - \rho R$, and

$$E_0^q(\rho, s) \triangleq -\log E\left[\left(\sum_{x'} P_X(x') \frac{q^{\mathrm{bicm}}(x', Y)^s}{q^{\mathrm{bicm}}(X, Y)^s}\right)^{\rho}\right] \tag{32}$$

is a generalized Gallager function. The expectation is carried out according to the joint distribution $P_{Y|X}(y|x) P_X(x)$.

IV. Bit Shaping for the Gaussian Channel 

A. Channel Model 

We consider the transmission over complex-plane signal sets ($\mathcal{X} \subset \mathbb{C}$, $\mathcal{Y} = \mathbb{C}$) in the AWGN channel. It is a memoryless channel satisfying

$$Y_k = \sqrt{\mathsf{snr}}\, X_k + Z_k, \quad k = 1, \dots, N, \tag{33}$$

where $Z_k$ are zero-mean, unit-variance, circularly symmetric complex Gaussian samples, and $\mathsf{snr}$ is the signal-to-noise ratio (SNR). We wish to solve the optimization problems in Eqs. (4), (13) and (28) with the additional constraints that $x \in \mathcal{X}$, $E[X] = 0$, and $E[|X|^2] = 1$.

We consider binary reflected Gray mapping.² For shaping, $2^m$-QAM signal sets are of special interest; this constellation is the Cartesian product of two $2^{m/2}$-PAM constellations, one for each of the in-phase and quadrature components of the channel. Since the optimum input distribution is known to be Gaussian, a good input distribution over the set $\mathcal{X}$ should approach in some sense a Gaussian density. Symmetry between the in-phase and quadrature components and along the zero axis (so that the positive and negative plane have equal probability) dictates that the optimization problems in Eqs. (4) and (13) respectively have

• $2^{\frac{m}{2}-1} - 1$ free parameters for CM, and

• $\frac{m}{2} - 1$ free parameters for BICM and MLC.

For BICM we used the symmetries of binary reflected Gray mapping and the fact that the most significant bit selects the positive or negative half-plane, and always has probability $\frac{1}{2}$.
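The parameter counts can be tabulated with a few lines of code. The sketch assumes the two counts read $2^{m/2-1} - 1$ for CM and $\frac{m}{2} - 1$ for BICM and MLC, which matches the single free parameter quoted for 16-QAM; the function name is hypothetical.

```python
def free_parameters(m):
    """Free parameters for 2^m-QAM (m even) under the stated symmetries.

    Returns (cm, bit_level): the count for CM and the count shared by
    BICM and MLC.
    """
    if m < 2 or m % 2:
        raise ValueError("m must be an even integer >= 2")
    # One PAM component has 2^(m/2) points; symmetry about zero leaves
    # 2^(m/2 - 1) distinct probabilities, minus one for normalization.
    cm = 2 ** (m // 2 - 1) - 1
    # m/2 bits per PAM component, with the sign bit fixed at 1/2.
    bit_level = m // 2 - 1
    return cm, bit_level

for m in (4, 6, 8, 10):
    print(2 ** m, free_parameters(m))
```

The count for CM grows exponentially in $m$ while the bit-level count grows linearly, so the gap widens quickly beyond 16-QAM.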

Note that the CM optimization problem does not restrict the input distribution to be $P_X(x) = \prod_{j=1}^{m} P_{B_j}\big(b_j(x)\big)$, and can hence potentially achieve larger rates. As we shall see, the resulting difference in information rates is however marginal. Moreover, note that there is an exponential relationship between the number of free parameters for BICM and CM, which can induce rather large computational savings for large signal sets. For example, since for 16-QAM there is only one free parameter for MLC and CM, the optimization will result in the best performance, i.e., MLC is optimal and BICM, as we shall see, is very close. However, for $m > 4$ this is no longer true and the optimization over symbol probabilities without restricting $P_X(x)$ to be the product of bit probabilities could potentially yield larger rates.

Fig. 1. Capacities for Gaussian inputs (thick solid), CM/MLC with shaping (thin solid), BICM with shaping (dashed) and BICM with equiprobable inputs (dotted line) for 16-QAM with Gray mapping and bit shaping as a function of $E_b/N_0$ (dB).

² Recall that the binary reflected Gray mapping for $m$ bits may be generated recursively from the mapping for $m - 1$ bits by prefixing a binary 0 to the mapping for $m - 1$ bits, then prefixing a binary 1 to the reflected (i.e. listed in reverse order) mapping for $m - 1$ bits. For QAM modulations in the Gaussian channel, the symbol mapping is the Cartesian product of Gray mappings over the in-phase and quadrature PAM components.
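The recursive construction of the binary reflected Gray mapping recalled in footnote 2 can be sketched as follows. The function name and the representation of labels as tuples of bits are choices made for this illustration.

```python
def reflected_gray(m):
    """Binary reflected Gray labels for m bits, listed in constellation order."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    labels = [(0,), (1,)]
    for _ in range(m - 1):
        # Prefix 0 to the current mapping, then prefix 1 to its reflection.
        labels = ([(0,) + label for label in labels]
                  + [(1,) + label for label in reversed(labels)])
    return labels

for label in reflected_gray(3):
    print("".join(str(bit) for bit in label))
```

Consecutive labels differ in exactly one bit, and the first bit splits the list into a lower and an upper half, which is why it acts as a sign bit for a PAM constellation.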

B. Numerical Examples 

Figure 1 shows the improvement in BICM capacity derived from shaping for 16-QAM with binary reflected Gray mapping. As we observe, $C^{\mathrm{bicm}}$ (dashed) is almost indistinguishable from $C^{\mathrm{cm}}$ or $C^{\mathrm{mlc}}$ (thin solid) or channel capacity itself (thick solid). This shows that shaping bits for BICM can recover the BICM capacity loss for equiprobable bits and effectively close the gap with CM and MLC. Note that the BICM demodulator is a one-shot non-iterative demodulator. In general, the decoding complexity of BICM is larger than that of MLC, since the codes of MLC are shorter. In practice, however, if the decoding complexity grows linearly with the number of bits in a codeword, e.g. with LDPC or turbo codes, the overall complexity of BICM becomes comparable to that of MLC.

Figure 2 shows the error exponents for CM and BICM, with and without shaping, for 16-QAM at snr = 8 dB. When shaping is used, the input distribution is the corresponding optimal capacity-achieving distribution. We observe that when shaping is used, in the region near capacity, the overall BICM error exponent is very close to that of CM, while when equiprobable bits are used, the exponent deviates from that of CM. Note that, according to [4], [10], the BICM error exponent cannot be larger than that of CM, as opposed to that of the independent parallel channel model. Furthermore, note that the error exponent of BICM is much larger than that of MLC (always being given by the minimum of the error exponents of the various levels, which results in an error exponent smaller than 1) [11], [12], [4], [13]. Therefore, in terms of error probability, BICM outperforms MLC.

Fig. 2. Error exponent zoom at capacity for 16-QAM with and without shaping for CM (solid) and BICM (dashed) at snr = 8 dB. The error exponent of the BICM parallel channel model is shown for comparison (dotted). When shaping is used, the input distribution is the corresponding optimal capacity-achieving distribution.

C. Wideband Regime 

The gain from shaping in BICM is especially remarkable at low snr, the wideband regime recently discussed at length by Verdú [15]. Following his methodology, rather than studying the exact expression of the information rate, one considers a second-order Taylor series in terms of snr,

$$R(\mathsf{snr}) = c_1 \mathsf{snr} + c_2 \mathsf{snr}^2 + o(\mathsf{snr}^2), \tag{34}$$

where the notation $o(\mathsf{snr}^2)$ indicates that the remaining terms vanish faster than a function $a\, \mathsf{snr}^2$, for $a > 0$ and small snr. A scheme is said to be first- and second-order optimal if $c_1 = 1$ and $c_2 = -\frac{1}{2}$, as it is for the channel capacity. In those conditions, such a system is both power- and bandwidth-efficient. For instance, it is well known that for low snr, QPSK is both first- and second-order efficient [15].
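The role of the two coefficients can be made concrete with a short numerical sketch. It assumes rates measured in nats per channel use, for which the complex AWGN capacity is $\log(1 + \mathsf{snr})$, and uses the standard wideband relation that the minimum $E_b/N_0$ equals $\log 2 / c_1$; the function names are hypothetical.

```python
import math

def rate_second_order(snr, c1=1.0, c2=-0.5):
    """Second-order expansion of Eq. (34), without the o(snr^2) term."""
    return c1 * snr + c2 * snr ** 2

def ebn0_min_db(c1):
    """Minimum E_b/N_0 in dB implied by the first-order coefficient c1."""
    return 10.0 * math.log10(math.log(2.0) / c1)

snr = 0.01
capacity = math.log(1.0 + snr)          # nats per channel use
approximation = rate_second_order(snr)  # c1 = 1, c2 = -1/2
print(capacity - approximation)         # residual of order snr^3
print(ebn0_min_db(1.0))                 # about -1.59 dB
```

A scheme with $c_1 < 1$ pays a power penalty of $-10 \log_{10} c_1$ dB relative to the $-1.59$ dB limit, whereas $c_2$ governs how quickly the rate falls away from the first-order line as snr grows.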

The low-snr performance of BICM was studied in [16], where general expressions for the coefficients $c_1$ and $c_2$ were given for general mapping rules and equiprobable signaling. For the particular case of binary reflected Gray mapping with square QAM constellations, it was found that BICM was suboptimal, in the sense that it did not achieve the optimum $c_1$ and $c_2$. References [17], [18] propose alternative mapping rules for BICM that achieve $c_1 = 1$, or equivalently $\left.\frac{E_b}{N_0}\right|_{\min} = \log_e 2 = -1.59$ dB, with equiprobable signaling. Incidentally, this disproves the conjecture from [3] that binary reflected Gray mapping is the optimum labeling rule for BICM schemes. However, the mappings of [17], [18] are not second-order optimal.

In the case of non-equiprobable signaling, binary reflected Gray mapping becomes optimal both in terms of $c_1$ and $c_2$.

Theorem 2: Shaping makes BICM transmission over QAM modulations with binary reflected Gray mapping first- and second-order optimal, i.e., $c_1 = 1$ and $c_2 = -\frac{1}{2}$.

The key fact is that the bit probabilities are such that a QPSK constellation is effectively selected. To see how, note that for $m = 2$ we have QPSK with Gray mapping. Limiting ourselves to one dimension, the binary reflected Gray mapping for $\frac{m}{2}$ bits is constructed from the mapping for $\frac{m}{2} - 1$ bits by prefixing a binary 0 to the mapping for $\frac{m}{2} - 1$ bits, then prefixing a binary 1 to the reflected (i.e. listed in reverse order) mapping for $\frac{m}{2} - 1$ bits. With shaping, one has the flexibility to fix each of these additional bits to a given value, say, 0, so that one is effectively transmitting over a BPSK constellation (QPSK over the two quadratures) when the resulting constellation is normalized in mean and energy. Note that this property does not necessarily hold for other mapping rules.
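The selection argument can be checked on a single PAM component. The sketch below assumes unnormalized PAM points at odd integers, labels them with the binary reflected Gray code, and keeps only the points whose bits other than the first (sign) bit are all fixed to 0; the helper names are hypothetical.

```python
def reflected_gray(n):
    """Binary reflected Gray labels for n bits, in constellation order."""
    labels = [(0,), (1,)]
    for _ in range(n - 1):
        labels = ([(0,) + label for label in labels]
                  + [(1,) + label for label in reversed(labels)])
    return labels

def selected_points(n):
    """PAM points that remain when all bits but the first are fixed to 0."""
    labels = reflected_gray(n)
    points = [2 * i - (2 ** n - 1) for i in range(2 ** n)]  # ..., -3, -1, 1, 3, ...
    return [x for x, bits in zip(points, labels)
            if all(b == 0 for b in bits[1:])]

print(selected_points(2))  # 4-PAM component of 16-QAM
print(selected_points(3))  # 8-PAM component of 64-QAM
```

Only the two outermost points survive, and they are antipodal, so after normalization each component behaves like BPSK. This relies on the reflection step, which places the labels 0...0 and 10...0 at opposite ends of the constellation.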

References 

[1] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE 
Trans. Inf. Theory, vol. 28, no. 1, pp. 55-66, 1982. 

[2] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," IEEE Trans. 
Commun., vol. 40, no. 5, pp. 873-884, 1992. 

[3] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modula- 
tion," IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927-946, 1998. 

[4] A. Guillén i Fàbregas, A. Martinez, and G. Caire, Bit-Interleaved Coded 
Modulation, vol. 5, Foundations and Trends on Communications and 
Information Theory, Now Publishers, 2008. 

[5] H. Imai and S. Hirakawa, "A new multilevel coding method using error- 
correcting codes," IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371-377, 
May 1977. 

[6] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, "Multilevel codes: 
theoretical concepts and practical design rules," IEEE Trans. Inf. Theory, 
vol. 45, no. 5, pp. 1361-1391, Jul. 1999. 

[7] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On 
information rates for mismatched decoders," IEEE Trans. Inf. Theory, 
vol. 40, no. 6, pp. 1953-1967, 1994. 

[8] A. Ganti, A. Lapidoth, and I. E. Telatar, "Mismatched decoding 
revisited: general alphabets, channels with memory, and the wideband 
limit," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2315-2328, 2000. 

[9] R. G. Gallager, Information Theory and Reliable Communication, John 
Wiley & Sons, Inc., New York, NY, USA, 1968. 

[10] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, "Bit- 
interleaved coded modulation revisited: A mismatched decoding per- 
spective," IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756-2765, 
Jun. 2009. 

[11] G. Beyer, K. Engdahl, and K. S. Zigangirov, "Asymptotical analysis 
and comparison of two coded modulation schemes using PSK signaling 
- Part I," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2782-2792, 2001. 

[12] G. Beyer, K. Engdahl, and K. S. Zigangirov, "Asymptotical analysis 
and comparison of two coded modulation schemes using PSK signaling 
- Part II," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2793-2806, 2001. 

[13] A. Ingber and M. Feder, "Capacity and error exponent analysis of 
multilevel coding with multistage decoding," in IEEE Int. Symp. Inf. 
Theory, Seoul, Korea, Jul. 2009, pp. 1799-1803. 

[14] A. Lapidoth, "Nearest neighbor decoding for additive non-Gaussian 
noise channels," IEEE Trans. Inf. Theory, vol. 42, no. 5, pp. 1520- 
1529, Sept. 1996. 

[15] S. Verdú, "Spectral efficiency in the wideband regime," IEEE Trans. 
Inf. Theory, vol. 48, no. 6, pp. 1319-1343, Jun. 2002. 

[16] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, "Bit- 
interleaved coded modulation in the wideband regime," IEEE Trans. 
Inf. Theory, vol. 54, no. 12, pp. 5447-5455, Dec. 2008. 

[17] C. Stierstorfer and R. F. H. Fischer, "Asymptotically optimal mappings 
for BICM with M-PAM and M-QAM," IET Electronics Letters, vol. 45, 
no. 3, pp. 173-174, Jan. 2009. 

[18] E. Agrell and A. Alvarado, "On optimal constellations for BICM at low 
SNR," in IEEE Inf. Theory Workshop, Taormina, Italy, Oct. 2009. 



